Taking the stress out of your code mess

Rhian Davies | @statsRhian

About Me 👋

Cartoon of a woman holding out a book

About Jumping Rivers

  • Data science & machine learning
  • Training courses
  • Dashboard development and deployment
  • Infrastructure
  • Managed Posit services

Cartoon of three people working at computers

I’m going to tell you a story

Meet Jane

  • Environmental scientist
  • Specialises in carbon models
  • Comfortable using R in an academic setting

A cartoon robot holding a test tube and wearing a lab coat

Jane was frustrated

  • Inherited a pile of messy R code
  • Deeply nested
  • Responsible for getting it to work, fast
  • My Matryoshka doll code

Seven traditional wooden Russian dolls, going from largest on the left to smallest on the right.

The solution

  • A series of 1:1 bespoke coding sessions
  • Unnesting the code, one doll at a time

A cartoon robot holding a test tube and wearing a lab coat

The messy zone

Define the messy zone

Throughout the refactoring process, it was essential that we were able to continuously run and test the functionality of the package. But how could we rewrite all of our functions without affecting the functionality of the whole code base?

  • Start with the smallest functions first
  • Re-write the main body
  • Clearly defined messy zones at the start and the end

Jumping Rivers robot using a vacuum cleaner on a pile of code and text.

Define the messy zone: benefits

  • Being explicit about where the mess is allowed us to focus on simplifying and clarifying the internals
  • Higher level code functioned as expected
  • Didn’t have to commit to the structure of the function parameters initially
  • We could work that out naturally as the code evolved
  • Clear markers of where we would need to clean up later
  • Avoided the issue of forgetting to change parameters everywhere

Define the messy zone example

example = function(arg1, arg2) {
  # Messy zone: unpack the inherited inputs into clear names
  a_better_name = arg1$mess$ugh
  helpful_name = arg2$what$is$this

  # Refactor the internals
  useful_result = a_better_name + helpful_name
  sensible_name_tibble = a_better_name * helpful_name

  # Messy zone: repack the results in the shape the calling code expects
  results = list()
  results$some$mess = useful_result
  results$another$naff$list = sensible_name_tibble
  results
}

Push the mess up

Once the inner functions are clean

  1. List all the arguments of the inner functions
  2. List all the return values of the inner functions
  3. Double check names
  4. Clear the messy zone in one go
  5. Move up one level
  6. Repeat
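The steps above can be sketched roughly as follows. This is a minimal illustration and not Jane's actual code: `inner_step` and `caller` are hypothetical names, and the nested list structure is borrowed from the messy zone example. Once the inner function takes plain arguments and returns a plain named list, its messy zones are gone and the unpacking and repacking move up to the function that calls it.

```r
# Hypothetical inner function with its messy zones cleared:
# plain, well-named arguments in, a plain named list out.
inner_step = function(a_better_name, helpful_name) {
  list(
    useful_result = a_better_name + helpful_name,
    sensible_name = a_better_name * helpful_name
  )
}

# Hypothetical caller, one level up. The mess now lives here.
caller = function(arg1, arg2) {
  # Messy zone: unpack the inherited structure
  a_better_name = arg1$mess$ugh
  helpful_name = arg2$what$is$this

  # Clean internals: call the tidy inner function
  inner_results = inner_step(a_better_name, helpful_name)

  # Messy zone: repack into the shape the rest of the code still expects
  results = list()
  results$some$mess = inner_results$useful_result
  results$another$naff$list = inner_results$sensible_name
  results
}
```

When `caller` is cleaned in turn, the same two messy zones move up again to whatever calls it.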

Salt N Pepa Push It Real Good GIF

Quick tips

Start with a blank slate

  • Avoid the temptation to copy-paste
  • Can tie you to the old style

Take time to design

Jane and I had one session where we didn’t touch any code at all. We talked, doodled and drew diagrams. You might leave a session like this feeling a little deflated that you didn’t achieve anything. However, that session was actually the most valuable. In the following session we made huge progress, because we had already done the hard work of thinking out the design fully. We were able to whizz through the functions, implementing our new design efficiently. We found ourselves constantly referring back to the diagrams to remind ourselves of the design choices we had made.

My favourite tools:

A good name goes a long way

We all know that it’s important to choose good names for parameters and functions. However, in this project, I was surprised just how much of a difference a good name makes. Sometimes, the only thing we would change in a function would be the names. Often a simple rename morphed the unintelligible code in front of me into a clear, readable explanation of the approach.

# Example
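As an illustration of a rename-only refactor, here is a small before-and-after sketch. The function and variable names are hypothetical and the calculation (a weighted mean) is only a stand-in, not something from Jane's model. The logic is identical in both versions; only the names differ.

```r
# Before: the names say nothing about intent
f = function(x, y) {
  tmp = sum(x * y)
  tmp2 = sum(y)
  tmp / tmp2
}

# After: same logic, only the names have changed
weighted_mean = function(values, weights) {
  weighted_total = sum(values * weights)
  total_weight = sum(weights)
  weighted_total / total_weight
}
```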

Test regularly

  • Things will go wrong, e.g. deleted code or stray brackets

  • You want to know as soon as possible

  • Ensure that your code is always runnable

  • Check your code still works by running your tests - {testthat}

  • For statistical models, check that the numerical results are unaffected by the refactor

  • CI/CD
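One way to check that numerical results survive a refactor is to compare the old and new implementations on the same inputs with {testthat}. The sketch below assumes both versions are available side by side; `old_model` and `new_model` are hypothetical stand-ins, not functions from the real package.

```r
library(testthat)

# Hypothetical stand-ins for the original and the refactored function
old_model = function(x) {
  tmp = x * 2
  tmp + 1
}

new_model = function(input) {
  2 * input + 1
}

test_that("refactored model reproduces the original results", {
  inputs = c(0, 1.5, 10)
  # expect_equal() compares numeric values up to a small tolerance
  expect_equal(new_model(inputs), old_model(inputs))
})
```

Running a test like this after every small change, locally or in CI, flags a numerical difference as soon as it is introduced.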

Why rather than How

The code was initially what I would call “How” programming. The different components of the functions were grouped by how the calculations were computed programmatically rather than why we were calculating them. This made it hard for someone new to the code to understand what each function did.

I’m not an environmental scientist, so I don’t understand all of the science behind Jane’s complex model. However, by asking questions about what she was trying to achieve, we re-grouped the different stages of each function in terms of the science, rather than the implementation. Changing the focus of the code to the scientific method made it much clearer to follow.
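The difference can be sketched with a toy example. Everything here is hypothetical: the function names, the stages and the calibration factor of 0.5 are invented for illustration and have nothing to do with Jane's actual model. Both versions compute the same number, but the second is organised by what each stage is for.

```r
# "How" grouping: steps organised by programming technique
run_model_how = function(measurements) {
  # all the subsetting
  valid = measurements[!is.na(measurements)]
  # all the arithmetic
  scaled = valid * 0.5
  sum(scaled)
}

# "Why" grouping: each stage named for its (hypothetical) scientific purpose
remove_failed_readings = function(measurements) {
  measurements[!is.na(measurements)]
}

apply_calibration = function(readings, calibration_factor = 0.5) {
  readings * calibration_factor
}

run_model_why = function(measurements) {
  readings = remove_failed_readings(measurements)
  calibrated = apply_calibration(readings)
  sum(calibrated)
}
```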

Do it with a friend

  • Refactoring can be daunting
  • Lots of moving parts
  • Hard to hold the overall design and small technical details simultaneously
  • The person helping you doesn’t have to understand the code in detail; in fact, sometimes it helps if they don’t!
  • More fun (joint celebration)
  • Learn stuff

Cartoon of four people sat around a table, with laptops. One of them is pointing at a projector screen with the Python logo on it.

Recap

Top tips

  1. Define the messy zone

  2. Push the mess up

  3. Start with a blank slate

  4. Take time to design

  5. A good name goes a long way

  6. Test regularly

  7. Why rather than How

  8. Do it with a friend

Questions?